Human computation scaling for measuring meaningful latent traits in political texts

Authors

  • Jacob M. Montgomery
  • David Carlson
Abstract

Scholars are increasingly interested in measuring latent political concepts embedded in written or spoken records. After all, most important political behaviors and outcomes are encoded in language. However, current approaches to turning natural language into meaningful measures are sometimes unsatisfying, relying on either costly and unreliable human coding or automated methods for document classification that miss subtleties of language easily identified by human readers. In this paper, we develop and validate an innovative “human computation” method for encoding political texts that preserves much of the reliability of automated methods while leveraging the superior ability of humans to read and understand natural language. We validate the method with online movie reviews, open-ended survey responses, advertisements for U.S. Senate candidates, and State Department reports on human rights. The framework we present is quite general, and we provide software to help researchers interact easily with online workforces to extract meaningful measures from texts.

∗We thank Burt Monroe, John Freeman, and Brandon Stewart for providing comments on a previous version of this paper. We are indebted to Ryden Butler, Dominic Jarkey, Jon Rogowski, Erin Rossiter, and Michelle Torres for their assistance with this project. We particularly wish to thank Matt Dickenson for his programming assistance. We also appreciate the assistance in the R package development from David Flasterstein, Joseph Ludmir, and Taishi Muraoka. We are grateful for the financial support provided by the Weidenbaum Center on the Economy, Government, and Public Policy. Finally, we wish to thank the partner-workers at Amazon’s Mechanical Turk who make this research possible.

Related resources

Validating Estimates of Latent Traits From Textual Data Using Human Judgment as a Benchmark∗

Automated and statistical methods for estimating latent political traits and classes from textual data hold great promise, since virtually every political act involves the production of text. Statistical models of natural language features, however, are heavily laden with unrealistic assumptions about the process that generates this data, including the stochastic process of text generation, the...

Estimating Uncertainty in Quantitative Text Analysis∗

Several methods have now become popular in political science for scaling latent traits— usually left-right policy positions—from political texts. Following a great deal of development, application, and replication, we now have a fairly good understanding of the estimates produced by scaling models such as “Wordscores”, “Wordfish”, and other variants (i.e. Monroe and Maeda’s two-dimensional esti...

Using Multidimensional Scaling for Assessment Economic Development of Regions

Addressing socio-economic development issues is strategic and of primary importance for any country. Multidimensional statistical analysis methods, including comprehensive index assessment, have been successfully used to address this challenge, but they don't cover all aspects of development, leaving some gap in the development of multidimensional metrics. The purpose of the study is to const...

Online News Media Bias Analysis using an LDA-NLP Approach

It is widely recognized that every media outlet has its own “spin” on news, and this bias has been described in many ways and at many levels. In political news for example, the bias can be liberal, conservative, moderate, corporate, etc. In addition, recent research has focused on the ‘sentiment dimension’ to further identify and categorize news bias. This is achieved through analysis of the ad...

Unsupervised Cross-Lingual Scaling of Political Texts

Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos). Existing models scale texts based on relative word usage and cannot be used for cross-lingual analyses. Additionally, there is little quantitative evidence that the output of these models correl...

Publication date: 2016